SMS Spam Detection and Sender Blocking using Machine Learning

Authors: Miss. R. Vennela, Mrs. Jennifer Mary S

DOI Link: https://doi.org/10.22214/ijraset.2025.74174

Abstract

With the rapid growth of mobile communication, SMS spam has become a significant challenge, leading to financial fraud, phishing attacks, and privacy breaches. This paper proposes a machine learning-based SMS spam detection system with a sender blocking feature. To categorize messages as safe, suspicious, or spam, the system uses text preprocessing, TF-IDF feature extraction, and a Naïve Bayes classifier. To enhance security, the system issues warnings and blocks senders after repeated spam attempts. Experimental results show that the approach is effective at reducing spam and protecting users from malicious senders.

Introduction

With the rise of mobile phone usage, SMS (Short Message Service) has become a widely used communication tool. However, its popularity has led to a surge in SMS spam, that is, irrelevant and often fraudulent messages. The limited computational resources of mobile devices make it challenging to detect and filter such spam effectively.

Project Overview

This project proposes a hybrid spam detection system using Multinomial Naïve Bayes and TF-IDF feature extraction to classify SMS messages into:

Safe
Suspicious
Spam

A sender blocking mechanism is also integrated, which issues warnings and blocks senders after three offenses.
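The warn-then-block behaviour described above can be sketched as a per-sender strike counter. This is a minimal illustration, not the authors' implementation: the class name `SenderBlocker`, the action strings, and the choice to count both "suspicious" and "spam" messages as offenses are assumptions made for the example.

```python
from collections import defaultdict

BLOCK_THRESHOLD = 3  # "three offenses", as stated in the text


class SenderBlocker:
    """Hypothetical helper: counts offenses per sender, warns, then blocks."""

    def __init__(self, threshold=BLOCK_THRESHOLD):
        self.threshold = threshold
        self.offenses = defaultdict(int)
        self.blocked = set()

    def record(self, sender, label):
        """Register one classified message and return the action taken."""
        if sender in self.blocked:
            return "blocked"
        if label == "safe":
            return "deliver"
        # Assumption: "suspicious" and "spam" both count as one offense.
        self.offenses[sender] += 1
        if self.offenses[sender] >= self.threshold:
            self.blocked.add(sender)
            return "blocked"
        return "warn"


blocker = SenderBlocker()
labels = ["spam", "safe", "suspicious", "spam", "safe"]
actions = [blocker.record("+15550100", label) for label in labels]
print(actions)  # ['warn', 'deliver', 'warn', 'blocked', 'blocked']
```

Once a sender is blocked, every later message from that sender is rejected regardless of its label, which is what makes the mechanism preventive rather than purely reactive.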

Problem Statement

SMS spam poses a threat by spreading misinformation and enabling fraudulent activities. Traditional spam filters often fail to address this adequately. This project addresses the need for a more accurate, automated, and proactive spam detection and prevention system.

Related Work

A literature survey highlights recent advancements:

Sharma et al. (2021): SVM with word embeddings
Kumar & Reddy (2022): Hybrid CNN-LSTM model
Patel et al. (2023): Naïve Bayes + TF-IDF approach
Ali & Singh (2024): Classification + sender blocking mechanism

Existing Spam Detection Methods

Standard Filtering: Rule-based filters using headers, blacklists, and user-defined rules.
Enterprise Filtering: Server-based filters with message scoring and list-based auto-blocking.
Case-Based Filtering: Machine learning-based classification with training and testing phases.

Proposed System

Main Components:

Data Preprocessing: Cleaning, stopword removal, normalization
Feature Extraction: Using TF-IDF
Classification: Using Multinomial Naïve Bayes
Sender Blocking: After three spam/suspicious messages
Visualization: Charts for performance analysis
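To make the first two components concrete, the following is a rough pure-Python sketch of preprocessing and TF-IDF weighting. The stopword list, the tokenization rule, and the smoothed IDF formula are assumptions chosen for illustration; the paper does not specify which variants it uses.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"a", "an", "the", "to", "is", "you", "your", "for", "of", "and", "at"}


def preprocess(text):
    """Lowercase, keep alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(corpus):
    """Return one {term: weight} dict per message.

    Assumed weighting: tf = count / message_length,
    idf = ln((1 + N) / (1 + df)) + 1 (a common smoothed form).
    """
    docs = [preprocess(text) for text in corpus]
    n_docs = len(docs)
    # Document frequency: number of messages containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (c / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, c in counts.items()
        })
    return vectors


messages = ["WIN a FREE prize now!", "Are you free at noon?", "Meeting moved to noon"]
vectors = tfidf(messages)
print(preprocess(messages[0]))  # ['win', 'free', 'prize', 'now']
```

In this toy corpus "free" appears in two messages and "win" in only one, so "win" receives the higher weight in the first message. That is the intended effect of the IDF term: words that are rare across messages are treated as more informative.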

Architecture Diagrams Included:

Flowchart of SMS processing
System architecture
UML Use Case diagram (user and admin roles)

Methodology

Dataset: 5,000 SMS messages (from Kaggle)
Phases:
- Data cleaning
- Feature extraction
- Model training/testing (80:20 split)
- Classification
- Sender blocking
- Result visualization
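The training and classification phases might look roughly like the sketch below. It makes several simplifying assumptions that are not taken from the paper: the classifier works on raw token counts rather than TF-IDF weights, the non-spam class is labelled "ham", and the cut-offs that turn a spam probability into safe, suspicious, or spam (`low=0.4`, `high=0.8`) are hypothetical, since the text does not say how the "suspicious" class is defined.

```python
import math
import random
from collections import Counter, defaultdict


def split_80_20(rows, seed=0):
    """Shuffle and split rows into 80% train and 20% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) * 4 // 5
    return rows[:cut], rows[cut:]


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with Laplace smoothing."""

    def fit(self, rows):
        # rows: iterable of (tokens, label) pairs
        self.class_counts = Counter()
        self.term_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in rows:
            self.class_counts[label] += 1
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def spam_probability(self, tokens):
        """Posterior P(spam | tokens), computed in log space."""
        total = sum(self.class_counts.values())
        log_post = {}
        for label, n in self.class_counts.items():
            denom = sum(self.term_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for t in tokens:
                if t in self.vocab:  # skip terms never seen in training
                    score += math.log((self.term_counts[label][t] + 1) / denom)
            log_post[label] = score
        peak = max(log_post.values())
        exp = {k: math.exp(v - peak) for k, v in log_post.items()}
        return exp.get("spam", 0.0) / sum(exp.values())


def three_way_label(p_spam, low=0.4, high=0.8):
    """Map a spam probability to a class using hypothetical cut-offs."""
    if p_spam >= high:
        return "spam"
    if p_spam >= low:
        return "suspicious"
    return "safe"


# Toy training data, invented for the example.
train = [
    (["win", "free", "prize"], "spam"),
    (["free", "cash", "now"], "spam"),
    (["lunch", "at", "noon"], "ham"),
    (["meeting", "moved", "noon"], "ham"),
]
model = MultinomialNB().fit(train)
print(three_way_label(model.spam_probability(["free", "prize"])))  # spam
print(three_way_label(model.spam_probability(["free", "noon"])))   # suspicious
print(three_way_label(model.spam_probability(["noon"])))           # safe
```

Applied to the 5,000-message dataset, `split_80_20` would yield 4,000 training and 1,000 test messages. Thresholding the posterior is only one plausible way to obtain a middle "suspicious" band from a two-class model.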

Testing & Results

Accuracy: 97.5%
Precision: 96.8%
Recall: 97.1%
F1-Score: 96.9%
Evaluation Tools: Confusion matrix, pie charts, bar graphs
Outcome: High reliability with minimal false positives/negatives
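For reference, the four metrics above follow from the confusion matrix as sketched below. The counts passed to `metrics` are invented for illustration and are not the paper's confusion matrix; only the final check uses the reported precision and recall.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
example = metrics(tp=90, fp=10, fn=5, tn=895)

# Consistency check on the reported precision (96.8%) and recall (97.1%).
reported_f1 = f1_from(0.968, 0.971)
print(round(reported_f1 * 100, 1))  # 96.9
```

The harmonic mean of the reported precision and recall rounds to 96.9%, which agrees with the F1-Score listed above.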

The system not only detects spam but also actively prevents future spam by blocking repeat senders, thereby improving user safety and trust.

Conclusion

This paper presents a machine learning-based SMS spam detection and sender blocking system. The integration of TF-IDF with Naïve Bayes provides efficient classification, while the sender blocking mechanism enhances user safety. The primary objective of this work is to mitigate the problems that fraudulent messages cause for mobile users. An extensive literature review was conducted to identify an appropriate model for the problem statement. The results reported here (97.5% accuracy on the test split) support the expectation that the approach is more accurate than traditional filtering algorithms. In the future, deep learning models such as transformers can be incorporated for improved accuracy. Additionally, deployment as a mobile app with real-time filtering would make the system more practical for end users.

References

[1] Sharma, A., et al. "SMS Spam Detection Using SVM and Word Embeddings," IEEE Access, 2021.
[2] Kumar, R., & Reddy, S. "Hybrid CNN-LSTM for Spam SMS Classification," Journal of AI Research, 2022.
[3] Patel, V., et al. "Naïve Bayes and TF-IDF for SMS Spam Detection," Springer, 2023.
[4] Ali, M., & Singh, P. "Integrated Spam Detection and Sender Blocking System," Elsevier, 2024.

Copyright

Copyright © 2025 Miss. R. Vennela, Mrs. Jennifer Mary S. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET74174

Publish Date : 2025-09-09

ISSN : 2321-9653

Publisher Name : IJRASET
